# RPI implementation

## Preparation
I recommend using Docker to build the environment. The Dockerfile is at `rpi/docker/Dockerfile`.
Please feel free to let anonymized know if you have any questions.

You need to set the environment variables `WANDB_API_KEY` (for Weights & Biases logging) and `MAMBA_EXPERT_DIR` (the directory where expert checkpoints are stored and loaded from).
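Before launching a long job, it can help to fail fast when either variable is missing. The sketch below is a minimal check, assuming nothing beyond the two variable names listed above; the helper names `missing_env_vars` and `require_env_vars` are hypothetical and are not part of this repo.

```python
import os

# The two environment variables this README asks you to set.
REQUIRED_ENV_VARS = ("WANDB_API_KEY", "MAMBA_EXPERT_DIR")


def missing_env_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


def require_env_vars(environ=os.environ):
    """Raise a RuntimeError naming every missing variable (hypothetical helper)."""
    missing = missing_env_vars(environ)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```

Calling `require_env_vars()` at the top of a launcher script surfaces the problem before any training starts.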

To train an expert policy with PPO:
``` sh
python -m rpi.scripts.train_ppo
```

## Training
The RPI training script is `rpi/scripts/train.py`.

To launch RPI training, specify a sweep file and the line number (`-l`) of the configuration to run, for example:
``` bash
python -m rpi.scripts.train rpi/scripts/sweep/default.jsonl -l 0
```
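Conceptually, `-l` picks one configuration out of the `.jsonl` sweep file. The sketch below shows that lookup, assuming each non-blank line is one JSON object of argument overrides and that indices are 0-based; it is an illustration, not the actual parsing code in `rpi/scripts/train.py`, and `parse_sweep_line` / `load_sweep_line` are hypothetical names.

```python
import json


def parse_sweep_line(lines, line_idx):
    """Return the config at 0-based index `line_idx` from a sweep file's lines.

    Assumption: every non-blank line is one JSON object of argument overrides.
    """
    configs = [line for line in lines if line.strip()]
    if not 0 <= line_idx < len(configs):
        raise IndexError(f"sweep has {len(configs)} configs, got index {line_idx}")
    return json.loads(configs[line_idx])


def load_sweep_line(sweep_path, line_idx):
    """File-based wrapper: iterate the file's lines and parse one of them."""
    with open(sweep_path) as f:
        return parse_sweep_line(f, line_idx)
```

For example, `load_sweep_line("rpi/scripts/sweep/default.jsonl", 0)` would return the first configuration as a dict.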

You can run RPI by setting `Args.algorithm = 'rpi'`, either through one of the following sweep files:

- `rpi_cartpole_final_all.jsonl`
- `rpi_cheetah_final_all.jsonl`
- `rpi_pendulum_final_all.jsonl`
- `rpi_walker_final_all.jsonl`
- `rpi_cheetah_no_experts.jsonl`
- `rpi_RAPS_final.jsonl`
- `rpi_metaworld_button_final_sparse.jsonl`
- `rpi_metaworld_button_final.jsonl`
- `rpi_metaworld_drawer_final_sparse.jsonl`
- `rpi_metaworld_drawer_final.jsonl`
- `rpi_metaworld_faucet_final_sparse.jsonl`
- `rpi_metaworld_faucet_final.jsonl`
- `rpi_metaworld_window_final_sparse.jsonl`
- `rpi_metaworld_window_final.jsonl`

or by directly modifying `rpi/scripts/sweep/default_args.py`.
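If you write your own sweep file, it is one JSON object per line. The sketch below generates such a file from a grid of values. `algorithm` mirrors `Args.algorithm`; the `seed` key is a hypothetical field name used only for illustration, so check the real argument names in `rpi/scripts/sweep/default_args.py` before relying on it.

```python
import itertools
import json


def make_sweep_configs(algorithms, seeds):
    """One config per (algorithm, seed) pair.

    `algorithm` mirrors `Args.algorithm`; `seed` is a HYPOTHETICAL key.
    """
    return [
        {"algorithm": algo, "seed": seed}
        for algo, seed in itertools.product(algorithms, seeds)
    ]


def sweep_lines(configs):
    """Serialize each config as one JSON line (the .jsonl layout)."""
    return [json.dumps(cfg) + "\n" for cfg in configs]


def write_sweep(path, configs):
    """Write the configs to `path`, one JSON object per line."""
    with open(path, "w") as f:
        f.writelines(sweep_lines(configs))
```

Because `itertools.product` varies the last argument fastest, all seeds of one algorithm occupy consecutive lines, which keeps related runs next to each other in the sweep index.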


## Running training sweep

Adjust the `--sweep` range to match the number of lines in the `.jsonl` file.
``` bash
$ rmx run anonymousslurm --contain --sweep 0-38 -d -- 'python -m rpi.scripts.train rpi/scripts/sweep/all/rpi_cartpole_final_all.jsonl -l $RMX_RUN_SWEEP_IDX'
```
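NOTE: The single quotes (`'...'`) are critical: they stop your local shell from expanding `$RMX_RUN_SWEEP_IDX`, so the variable is resolved inside each sweep job instead.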
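To avoid miscounting, the `--sweep` range can be derived from the file itself. This is a minimal sketch under two loudly stated assumptions: indices are 0-based (matching `-l 0`), and the `rmx` range end is inclusive. Verify the latter against the `rmx` documentation. `sweep_range` and `sweep_range_for_file` are hypothetical helpers.

```python
def sweep_range(lines):
    """Build a `--sweep` range string covering every config line.

    ASSUMPTIONS: indices are 0-based and the range end is inclusive;
    blank lines are not counted as configs.
    """
    n = sum(1 for line in lines if line.strip())
    if n == 0:
        raise ValueError("sweep file has no configs")
    return f"0-{n - 1}"


def sweep_range_for_file(path):
    """Count the configs in a .jsonl sweep file and return its range."""
    with open(path) as f:
        return sweep_range(f)
```

For instance, `sweep_range_for_file("rpi/scripts/sweep/all/rpi_cartpole_final_all.jsonl")` returns the string to pass to `--sweep`.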


## Generating plots
Edit `rpi/scripts/plots/generate_plot.py`.
First, make sure that `domain2expert_info` (line 32) matches the one in your sweep generation script.
Then set `num_seeds` in `plot_master.py` (line 27) to the number of seeds used during training.

Beyond these two settings, lines 183-208 (reproduced below) contain the key parameters to change:
- `query` specifies the query sent to the wandb API.
- `group_keys` specifies which run config keys are used to group runs.
- `group2legend` maps each group to its legend label.
- `group2color` maps each group to its plot color.
``` python
  **{f"main-plot-{domain}-{i}": {
      "query": get_query_set(domain, expert_steps, algorithms=['rpi','lops-aps', 'mamba', 'pg-gae'], learner_pi=['all', 'rollin']),
      "xlabel": "Training step",
      "ylabel": "Best return",
      "group_keys": ["algorithm", "use_riro_for_learner_pi"],
      "ykey": "eval/best-so-far",
      "xkey": "step",
      "hbar": "expert_vals",
      "group2legend": {
          "mamba-none": "Mamba",
          "lops-aps-all": "LOPS-APS-all",
          "lops-aps-rollin": "LOPS-APS-ri",
          # "lops-aps-ase-all": "LOPS-APS-ASE",
          "pg-gae-none": "PPO-GAE"
      },
      "group2color": {
          "mamba-none": colors[0],
          "lops-aps-all": colors[1],
          "lops-aps-rollin": colors[-1],
          # "lops-aps-ase-all": colors[2],
          "pg-gae-none": colors[3]
      },
      "plot_dir": "generated/main-plot"
  } for domain in domains
      for i, expert_steps in enumerate(domain2expert_info[domain])
      },
```
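The legend keys above (for example `"mamba-none"` and `"lops-aps-all"`) suggest that a group label is the values of `group_keys` joined with `-`, with a missing value written as `none`. The sketch below reproduces that convention and also checks each group against `num_seeds`. The joining rule is inferred from the legend keys, not taken from the plotting code, and `group_label` / `group_runs` are hypothetical names operating on plain config dicts rather than wandb run objects.

```python
from collections import defaultdict


def group_label(config, group_keys):
    """Join group-key values with '-', writing missing/None values as 'none'.

    ASSUMED convention, inferred from legend keys such as "mamba-none".
    """
    parts = []
    for key in group_keys:
        value = config.get(key)
        parts.append("none" if value is None else str(value))
    return "-".join(parts)


def group_runs(configs, group_keys, num_seeds):
    """Group run configs by label; report groups whose run count != num_seeds."""
    groups = defaultdict(list)
    for cfg in configs:
        groups[group_label(cfg, group_keys)].append(cfg)
    mismatched = {
        label: len(runs) for label, runs in groups.items() if len(runs) != num_seeds
    }
    return dict(groups), mismatched
```

A non-empty `mismatched` dict usually means `num_seeds` or the query does not match the runs that were actually launched. A label that is absent from `group2legend` will also have no legend entry, so `group2legend.get(label, label)` is a convenient fallback when debugging.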


We want to show the following results:

**Multiple oracles**

1. RPI outperforms the other baselines.
   - Dense reward:
     - `generate_plot_rpi_cartpole.py`
     - `generate_plot_rpi_cheetah.py`
     - `generate_plot_rpi_walker.py`
     - `generate_plot_rpi_metaworl_button.py`
     - `generate_plot_rpi_metaworl_drawer.py`
     - `generate_plot_rpi_metaworl_window.py`
     - `generate_plot_rpi_metaworl_faucet.py`
   - Sparse reward:
     - `generate_plot_rpi_pendulum.py`
     - `generate_plot_rpi_metaworl_button_sparse.py`
     - `generate_plot_rpi_metaworl_drawer_sparse.py`
     - `generate_plot_rpi_metaworl_window_sparse.py`
     - `generate_plot_rpi_metaworl_faucet_sparse.py`

2. RPI learns state-wise oracle expertise.
   - `generate_plot_rpi_cartpole_statewise_combination.py`

3. Ablation study: confidence-aware RAPS.
   - `generate_plot_rpi_walker_lcb_ucb_ablation.py`
   - `generate_plot_rpi_carpole_lcb_ucb_ablation.py`
   - `generate_plot_rpi_cheetah_lcb_ucb_ablation.py`
   - `generate_plot_rpi_pendulum_lcb_ucb_ablation.py`

4. Ablation study: confidence-aware RPG on Gamma.
   - `generate_plot_rpi_cartpole_uncertainty.py`

5. Baseline alignment using the MAPS setting.
   - `generate_plot_rpi_walker_original_setting.py`
   - `generate_plot_rpi_pendulum_original_setting.py`
   - `generate_plot_rpi_cartpole_original_setting.py`
   - `generate_plot_rpi_cheetah_original_setting.py`



**No oracles**

6. RPI performs close to PPO-GAE.
   - `generate_plot_rpi_pendlum_no_experts.py`

**Single oracle**

7. RPI can outperform the other baselines when given a good oracle.
   - `generate_plot_rpi_cheetah_single_oracle.py`

8. Defining a good or bad oracle.
   - `generate_plot_rpi_cheetah_good_bad_oracle.py`

**Visualization**

9. Visualize how RPI actively alternates between imitation learning and reinforcement learning.
   - `generate_imitate_reinforce_plot.py`



